Non-stationary domains that change in unpredictable ways are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named {\em Online ASP for MDP} (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
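The abstract gives no implementation details, so the following is only a minimal sketch of the loop it describes: tabular Q-learning whose state set is rebuilt from constraint rules whenever the agent observes a domain change. The grid world, the reward values, and the `derive_states` function (a stand-in for the answer-set solving step, which a real implementation would delegate to an ASP solver such as clingo) are all hypothetical assumptions, not the authors' method.

```python
import random
from collections import defaultdict

GRID = 5                      # hypothetical 5x5 grid world
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
GOAL = (4, 4)

def derive_states(blocked):
    """Stand-in for the ASP step: enumerate the states consistent with
    the current constraint rules (here, 'cell is blocked' facts)."""
    return {(x, y) for x in range(GRID) for y in range(GRID)
            if (x, y) not in blocked}

def step(state, action, states):
    """Transition; moves that violate a constraint leave the agent in place."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if nxt not in states:
        nxt = state
    return nxt, (1.0 if nxt == GOAL else -0.01)

def q_learning(Q, states, episodes=500, alpha=0.1, gamma=0.95, eps=0.2):
    """Standard epsilon-greedy tabular Q-learning over the valid state set."""
    for _ in range(episodes):
        s = (0, 0)
        while s != GOAL:
            a = (random.choice(ACTIONS) if random.random() < eps
                 else max(ACTIONS, key=lambda act: Q[(s, act)]))
            s2, r = step(s, a, states)
            best = max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s2
    return Q

Q = defaultdict(float)
states = derive_states(blocked=set())
q_learning(Q, states)

# Domain change observed: new obstacles arrive as constraint rules.
# Rebuild the state set and keep the previously learned Q-values for
# states that remain valid, so learning resumes rather than restarts.
states = derive_states(blocked={(2, 2), (2, 3)})
Q = defaultdict(float, {k: v for k, v in Q.items() if k[0] in states})
q_learning(Q, states)
print(max(ACTIONS, key=lambda act: Q[((0, 0), act)]))  # greedy action at start
```

The key step this sketch tries to illustrate is the last block: the answer sets prune invalid states, and the value function is updated rather than re-learned from scratch, which is how the abstract's claim of not interfering with action-value function approximation would play out in practice.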